df<- read_csv("candy_production.csv")
## Rows: 548 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (1): IPG3113N
## date (1): observation_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(df)<-c("date", "production")
df$date<-as.Date(df$date, format="%Y-%m-%d")
CandyXTS<- xts(df[-1], df[[1]])
CandyTS <- ts(df$production, start=c(1972,1),end=c(2017,8), frequency=12 )
ts_plot(CandyXTS, title="Candy Production Time Series")
Trend
ts_decompose(CandyTS)
I can see that the trend of the series is fairly flat up until roughly 1983 and then increases consistently until about 2006. After this, candy production declines somewhat; the decline coincides with the 2008 recession, when a drop in output would be expected. The trend is clearly not linear.
I can also tell that there is a strong seasonal pattern. In the next section I will look at seasonality in more depth.
Seasonality
ts_seasonal(CandyTS, type = "all")
The first plot shows the full frequency cycle of the series, which for a monthly series is a full year. The plot’s color scale is set by the chronological order of the lines (dark colors for early years, bright colors for recent ones). From this I can see a yearly seasonal pattern, where candy production ramps up in October through December.
The second plot shows each frequency unit over time; in this case, each line represents one month’s production over the years. The seasonal pattern appears to remain stable over time, while overall production has increased.
The last plot is a box plot of each frequency unit, which allows us to compare the distributions across months. Consistent with the plots above, I can see that production increases in the final months of the year.
Next I want to use a periodogram to quantitatively evaluate seasonality.
p <- periodogram(CandyTS)
max_freq <- p$freq[which.max(p$spec)]
seasonality <- 1/max_freq
seasonality
## [1] 576
There seem to be two major spikes. The tallest spike implies a period of 576 months, which would be equivalent to 48 years. Since the series contains only 548 observations, and such a low frequency mostly reflects the long-run trend rather than a repeating cycle, I do not believe 576 captures an accurate seasonal pattern.
I will look at the second largest spike in the periodogram and see what it suggests.
# Identify the indices of the two largest spectral densities
largest_indices <- order(p$spec, decreasing = TRUE)[1:2]
# Retrieve the corresponding frequencies
largest_freqs <- p$freq[largest_indices]
# Select the second largest frequency
second_max_freq <- largest_freqs[2]
# Estimate the seasonality based on the second maximum frequency
seasonality_2 <- 1/second_max_freq
# Print the seasonality
seasonality_2
## [1] 12
For this, the period is 12 months, which suggests a yearly pattern. This is consistent with the decomposition and seasonal plots above.
Autocorrelation
Acf(CandyTS)
The ACF plot depicted above exhibits signs of non-stationarity: the autocorrelations decay slowly and remain well outside the dashed significance bounds, indicating correlation that deviates significantly from zero. Furthermore, I know there is seasonality in this time series, which introduces time dependence and consequently establishes its non-stationary nature.
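As a complementary check (my addition, not part of the original analysis), `forecast::ndiffs()` applies a unit-root test and reports how many differences a series needs to become stationary. A sketch on a simulated trending series standing in for CandyTS:

```r
library(forecast)

# Simulated random walk with drift, standing in for the candy series
set.seed(1)
y <- ts(cumsum(rnorm(200)) + 0.1 * seq_len(200), frequency = 12)

# ndiffs() runs a unit-root test (KPSS by default); a result >= 1 means
# the series must be differenced at least once to become stationary
ndiffs(y)
```

Running the same call on CandyTS would give a formal counterpart to the visual ACF diagnosis.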
train<-window(CandyTS, start=c(1972,1), end=c(2009,12))
test<-window(CandyTS, start=c(2010,1))
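The horizon used for every forecast below is h = 92; that number comes straight from the test window, since January 2010 through August 2017 is 7 full years plus 8 months:

```r
# Test window: Jan 2010 .. Aug 2017 = 7 full years + 8 months
h <- 7 * 12 + 8
h  # 92, the horizon passed to every forecasting function below
```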
Average: forecast will be equal to the average of past data
m_mean<-meanf(train, h=92)
accuracy(m_mean, test)
## ME RMSE MAE MPE MAPE MASE
## Training set -1.131747e-15 19.03110 16.087687 -4.017003 17.335216 3.195745
## Test set 4.974573e+00 12.11305 9.751367 3.689583 8.981927 1.937064
## ACF1 Theil's U
## Training set 0.8782726 NA
## Test set 0.7935696 1.670889
Naive: forecast will be equal to the last observation
m_naive<-naive(train, h=92)
accuracy(m_naive, test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.06780 9.34429 6.592331 -0.3756556 6.679489 1.309536
## Test set -11.74155 16.11966 13.450687 -12.4376238 13.805155 2.671917
## ACF1 Theil's U
## Training set 0.2603380 NA
## Test set 0.7935696 2.551853
Seasonal Naive: forecast will be equal to the last observation of same season
m_snaive<-snaive(train, h=92)
accuracy(m_snaive, test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.4044957 6.650535 5.034096 0.1927988 5.360879 1.000000
## Test set 11.1960207 14.003395 12.094347 10.9108145 11.712190 2.402486
## ACF1 Theil's U
## Training set 0.7571979 NA
## Test set 0.6864750 2.028931
Plots
autoplot(train)+
autolayer(m_mean, series="Mean", PI=FALSE)+
autolayer(m_naive, series="Naive", PI=FALSE)+
autolayer(m_snaive, series="Seasonal Naive", PI=FALSE)+
xlab('Year')+ylab('Candy Production')+
ggtitle('Forecasts for Candy Production')+
guides(colour=guide_legend(title='Forecast'))
When evaluated against the test set, the mean method has the best RMSE and MAPE of these three baseline methods.
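For reference, the two headline metrics used in these comparisons can be computed by hand; a minimal sketch of the definitions that `accuracy()` reports:

```r
# RMSE penalizes large errors quadratically and is in the units of the data;
# MAPE is a scale-free percentage but breaks down when actuals are near zero
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mape <- function(actual, pred) 100 * mean(abs((actual - pred) / actual))

actual <- c(100, 110, 120)
pred   <- c(98, 115, 118)
c(RMSE = rmse(actual, pred), MAPE = mape(actual, pred))
```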
Simple exponential: smoothing for level
m_ses<-ses(train, h=92)
accuracy(m_ses, test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.08912664 9.345527 6.59965 -0.349842 6.690203 1.310990
## Test set -11.74160291 16.119700 13.45072 -12.437675 13.805191 2.671924
## ACF1 Theil's U
## Training set 0.2564186 NA
## Test set 0.7935696 2.551859
Holt: smoothing for level and for trend
m_holt<-holt(train, h=92)
accuracy(m_holt, test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.04709578 9.349222 6.602357 -0.392977 6.694943 1.311528
## Test set -13.87178976 17.531921 14.975683 -14.450762 15.326538 2.974850
## ACF1 Theil's U
## Training set 0.2557387 NA
## Test set 0.7832662 2.763515
Holt Winters: smoothing for level, trend and seasonality
m_holtw<-hw(train, seasonal="additive", h=92)
accuracy(m_holtw, test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.03661278 3.910577 2.978740 -0.05128058 3.102872 0.5917130
## Test set 0.18823019 5.550671 4.634408 -0.04621848 4.467333 0.9206039
## ACF1 Theil's U
## Training set 0.1383633 NA
## Test set 0.6976633 0.8265308
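The additive specification assumes seasonal swings of roughly constant size; since production grows over the years, a multiplicative variant could also be worth comparing. This comparison was not part of the original analysis, so here is a sketch of the idea on the built-in AirPassengers series, a monthly series whose seasonal swings grow with the level:

```r
library(forecast)

ap_train <- window(AirPassengers, end = c(1958, 12))
ap_test  <- window(AirPassengers, start = c(1959, 1))

# Additive vs multiplicative seasonality: the latter lets the seasonal
# component scale with the level of the series
m_add  <- hw(ap_train, seasonal = "additive", h = length(ap_test))
m_mult <- hw(ap_train, seasonal = "multiplicative", h = length(ap_test))

c(additive = accuracy(m_add, ap_test)[2, "RMSE"],
  multiplicative = accuracy(m_mult, ap_test)[2, "RMSE"])
```

The same two-line comparison, swapping in train and test, would show whether multiplicative seasonality helps on the candy data.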
ETS: Allows other combinations of trend and seasonal components
Error, Trend, Seasonality
A = additive, M = multiplicative, N = none
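Rather than letting `ets()` auto-select, a specific combination can be forced with the `model` argument (three letters: error, trend, seasonality). An illustrative example on the built-in AirPassengers series:

```r
library(forecast)

# "MAM" = multiplicative error, additive trend, multiplicative seasonality;
# damped = FALSE pins down the trend so no damped variant is considered
fit <- ets(AirPassengers, model = "MAM", damped = FALSE)
fit$method
```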
m_ets<-forecast(ets(train), h=92)
summary(m_ets)
##
## Forecast method: ETS(A,N,A)
##
## Model Information:
## ETS(A,N,A)
##
## Call:
## ets(y = train)
##
## Smoothing parameters:
## alpha = 0.5772
## gamma = 0.2617
##
## Initial states:
## l = 79.9047
## s = 22.9395 27.0302 19.3452 -5.8776 -9.0329 -9.9271
## -9.6387 -13.234 -15.4527 -10.5926 -3.6046 8.0452
##
## sigma: 3.8803
##
## AIC AICc BIC
## 4044.229 4045.320 4106.066
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0875434 3.820267 2.914016 0.05221298 3.025859 0.5788559
## ACF1
## Training set 0.1020205
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2010 105.65965 100.68685 110.63245 98.05441 113.2649
## Feb 2010 103.83701 98.09518 109.57884 95.05564 112.6184
## Mar 2010 98.53152 92.11214 104.95091 88.71392 108.3491
## Apr 2010 92.76025 85.72830 99.79221 82.00580 103.5147
## May 2010 92.80072 85.20544 100.39599 81.18474 104.4167
## Jun 2010 92.96545 84.84584 101.08507 80.54757 105.3833
## Jul 2010 92.98669 84.37460 101.59878 79.81564 106.1577
## Aug 2010 100.15760 91.07972 109.23548 86.27418 114.0410
## Sep 2010 110.00786 100.48694 119.52877 95.44688 124.5688
## Oct 2010 121.53457 111.59034 131.47880 106.32619 136.7430
## Nov 2010 118.34048 107.99024 128.69073 102.51115 134.1698
## Dec 2010 116.24445 105.08495 127.40394 99.17747 133.3114
## Jan 2011 105.65965 94.13688 117.18242 88.03710 123.2822
## Feb 2011 103.83701 91.96208 115.71195 85.67587 121.9982
## Mar 2011 98.53152 86.31457 110.74847 79.84730 117.2157
## Apr 2011 92.76025 80.21060 105.30990 73.56721 111.9533
## May 2011 92.80072 79.92696 105.67448 73.11200 112.4894
## Jun 2011 92.96545 79.77555 106.15536 72.79324 113.1377
## Jul 2011 92.98669 79.48804 106.48534 72.34229 113.6311
## Aug 2011 100.15760 86.35712 113.95808 79.05158 121.2636
## Sep 2011 110.00786 95.91200 124.10371 88.45010 131.5656
## Oct 2011 121.53457 107.14940 135.91974 99.53435 143.5348
## Nov 2011 118.34048 103.67171 133.00926 95.90653 140.7744
## Dec 2011 116.24445 100.99388 131.49501 92.92072 139.5682
## Jan 2012 105.65965 90.14129 121.17801 81.92636 129.3929
## Feb 2012 103.83701 88.05539 119.61863 79.70111 127.9729
## Mar 2012 98.53152 82.49097 114.57208 73.99961 123.0634
## Apr 2012 92.76025 76.46488 109.05563 67.83862 117.6819
## May 2012 92.80072 76.25444 109.34699 67.49538 118.1061
## Jun 2012 92.96545 76.17203 109.75888 67.28213 118.6488
## Jul 2012 92.98669 75.94970 110.02368 66.93087 119.0425
## Aug 2012 100.15760 82.88048 117.43472 73.73453 126.5807
## Sep 2012 110.00786 92.49390 127.52181 83.22257 136.7931
## Oct 2012 121.53457 103.78694 139.28221 94.39191 148.6772
## Nov 2012 118.34048 100.36221 136.31876 90.84508 145.8359
## Dec 2012 116.24445 97.78841 134.70048 88.01838 144.4705
## Jan 2013 105.65965 86.98172 124.33758 77.09422 134.2251
## Feb 2013 103.83701 84.93979 122.73423 74.93621 132.7378
## Mar 2013 98.53152 79.41753 117.64552 69.29919 127.7639
## Apr 2013 92.76025 73.43191 112.08859 63.20011 122.3204
## May 2013 92.80072 73.26038 112.34105 62.91636 122.6851
## Jun 2013 92.96545 73.21540 112.71550 62.76036 123.1705
## Jul 2013 92.98669 73.02913 112.94425 62.46423 123.5091
## Aug 2013 100.15760 79.99466 120.32054 69.32104 130.9942
## Sep 2013 110.00786 89.64161 130.37410 78.86037 141.1553
## Oct 2013 121.53457 100.96703 142.10212 90.07922 152.9899
## Nov 2013 118.34048 97.57359 139.10738 86.58026 150.1007
## Dec 2013 116.24445 95.06260 137.42629 83.84961 148.6393
## Jan 2014 105.65965 84.28419 127.03512 72.96870 138.3506
## Feb 2014 103.83701 82.26967 125.40436 70.85261 136.8214
## Mar 2014 98.53152 76.77399 120.28905 65.25625 131.8068
## Apr 2014 92.76025 70.81418 114.70632 59.19663 126.3239
## May 2014 92.80072 70.66771 114.93372 58.95120 126.6502
## Jun 2014 92.96545 70.64708 115.28383 58.83244 127.0985
## Jul 2014 92.98669 70.48447 115.48891 58.57252 127.4009
## Aug 2014 100.15760 77.47303 122.84217 65.46455 134.8507
## Sep 2014 110.00786 87.14239 132.87332 75.03814 144.9776
## Oct 2014 121.53457 98.48963 144.57951 86.29037 156.7788
## Nov 2014 118.34048 95.11745 141.56352 82.82392 153.8570
## Dec 2014 116.24445 92.64961 139.83928 80.15927 152.3296
## Jan 2015 105.65965 81.89085 129.42845 69.30841 142.0109
## Feb 2015 103.83701 79.89551 127.77852 67.22164 140.4524
## Mar 2015 98.53152 74.41854 122.64450 61.65391 135.4091
## Apr 2015 92.76025 68.47701 117.04349 55.62225 129.8983
## May 2015 92.80072 68.34841 117.25302 55.40414 130.1973
## Jun 2015 92.96545 68.34523 117.58568 55.31207 130.6188
## Jul 2015 92.98669 68.19969 117.77369 55.07825 130.8951
## Aug 2015 100.15760 75.20494 125.11026 61.99580 138.3194
## Sep 2015 110.00786 84.89063 135.12508 71.59438 148.4213
## Oct 2015 121.53457 96.25385 146.81529 82.87105 160.1981
## Nov 2015 118.34048 92.89732 143.78365 79.42852 157.2524
## Dec 2015 116.24445 90.46148 142.02741 76.81280 155.6761
## Jan 2016 105.65965 79.71738 131.60192 65.98437 145.3349
## Feb 2016 103.83701 77.73641 129.93761 63.91959 143.7544
## Mar 2016 98.53152 72.27355 124.78949 58.37342 138.6896
## Apr 2016 92.76025 66.34584 119.17466 52.36290 133.1576
## May 2016 92.80072 66.23079 119.37064 52.16552 133.4359
## Jun 2016 92.96545 66.24092 119.68999 52.09381 133.8371
## Jul 2016 92.98669 66.10843 119.86495 51.87995 134.0934
## Aug 2016 100.15760 73.12650 127.18870 58.81710 141.4981
## Sep 2016 110.00786 82.82477 137.19095 68.43491 151.5808
## Oct 2016 121.53457 94.20034 148.86880 79.73047 163.3387
## Nov 2016 118.34048 90.85594 145.82503 76.30650 160.3745
## Dec 2016 116.24445 88.44504 144.04385 73.72893 158.7600
## Jan 2017 105.65965 77.71244 133.60686 62.91808 148.4012
## Feb 2017 103.83701 75.74277 131.93126 60.87057 146.8035
## Mar 2017 98.53152 70.29101 126.77203 55.34139 141.7217
## Apr 2017 92.76025 64.37423 121.14628 49.34757 136.1729
## May 2017 92.80072 64.26992 121.33151 49.16663 136.4348
## Jun 2017 92.96545 64.29062 121.64029 49.11108 136.8198
## Jul 2017 92.98669 64.16854 121.80485 48.91313 137.0603
## Aug 2017 100.15760 71.19683 129.11836 55.86593 144.4493
accuracy(m_ets, test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0875434 3.820267 2.914016 0.05221298 3.025859 0.5788559
## Test set 1.5358342 6.177580 4.964258 1.20188945 4.749965 0.9861270
## ACF1 Theil's U
## Training set 0.1020205 NA
## Test set 0.7363525 0.9024974
autoplot(train)+
autolayer(m_ses, series="Simple Exponential", PI=FALSE)+
autolayer(m_holt, series="Holt Method", PI=FALSE)+
autolayer(m_holtw, series="Holt_Winters", PI=FALSE)+
autolayer(m_ets, series="ETS", PI=FALSE)+
xlab('Month/Year')+ylab('Candy Production')+
ggtitle('Forecasts for Candy Production')+
guides(colour=guide_legend(title='Forecast'))
When evaluated against the test set, the Holt-Winters method has the best RMSE and MAPE of the exponential smoothing models.
autoarima <- auto.arima(train, allowdrift=F)
autoarima
## Series: train
## ARIMA(4,1,1)(2,1,2)[12]
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 sar1 sar2 sma1
## 0.6373 0.0778 0.2031 -0.0884 -0.9719 -0.4630 0.0364 -0.1129
## s.e. 0.0519 0.0572 0.0580 0.0499 0.0226 0.4073 0.1150 0.4037
## sma2
## -0.4569
## s.e. 0.3127
##
## sigma^2 = 13.59: log likelihood = -1206.69
## AIC=2433.38 AICc=2433.89 BIC=2474.32
arima <- Arima(train, order=c(4,1,1),
    seasonal=list(order=c(2,1,2), period=12))
arima
## Series: train
## ARIMA(4,1,1)(2,1,2)[12]
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 sar1 sar2 sma1
## 0.6373 0.0778 0.2031 -0.0884 -0.9719 -0.4630 0.0364 -0.1129
## s.e. 0.0519 0.0572 0.0580 0.0499 0.0226 0.4073 0.1150 0.4037
## sma2
## -0.4569
## s.e. 0.3127
##
## sigma^2 = 13.59: log likelihood = -1206.69
## AIC=2433.38 AICc=2433.89 BIC=2474.32
arima_f <- forecast(arima, h=92)
checkresiduals(arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,1,1)(2,1,2)[12]
## Q* = 32.667, df = 15, p-value = 0.005218
##
## Model df: 9. Total lags used: 24
Since the p-value of the Ljung-Box test is 0.005218, we have sufficient evidence to reject the null hypothesis that the residuals are independently distributed, and conclude that some autocorrelation remains in the residuals.
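`checkresiduals()` wraps base R's `Box.test()`; the df = 15 reported above is the 24 lags minus the 9 estimated ARMA coefficients. A sketch of the direct call on simulated stand-in residuals (the real call would pass `residuals(arima)` instead):

```r
# Ljung-Box: H0 = the series is independently distributed.
# fitdf subtracts the number of fitted parameters from the lag count,
# so the statistic is compared against 24 - 9 = 15 degrees of freedom.
set.seed(42)
resid_sim <- rnorm(456)  # white-noise stand-in for the model residuals
bt <- Box.test(resid_sim, lag = 24, fitdf = 9, type = "Ljung-Box")
bt$parameter  # df = 15
```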
residuals <- residuals(arima)
Acf(residuals)
p_resid <- periodogram(residuals)
The values in the ACF plot are mostly within the dashed thresholds, and the periodogram shows power spread across all frequencies, so the residuals look close to white noise, even though the Ljung-Box test above indicates some autocorrelation remains.
accuracy(arima_f, test)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.129378 3.597095 2.701688 -0.1833357 2.778181 0.5366779
## Test set 12.415226 15.640364 12.770918 11.6386579 12.011212 2.5368841
## ACF1 Theil's U
## Training set 0.005813222 NA
## Test set 0.866520116 2.235618
autoplot(train)+
autolayer(arima_f, series="ARIMA", PI=FALSE)+
xlab('Month/Year')+ylab('Candy Production')+
ggtitle('Forecasts for Candy Production')+
guides(colour=guide_legend(title='Forecast'))
Model Comparisons
library(tibble)
library(dplyr)  # for left_join()
library(knitr)  # for kable()
# Create forecasts for each model
forecasts <- list(
Mean = m_mean,
Naive = m_naive,
Seasonal_Naive = m_snaive,
ETS = m_ets,
Holt = m_holt,
Holt_Winters = m_holtw,
Simple_Exponential = m_ses,
ARIMA = arima_f
)
# Calculate test-set RMSE for each forecast (row 2 of accuracy() is the test set)
rmse_values <- sapply(forecasts, function(fc) {
  accuracy(fc, test)[2, "RMSE"]
})
# Create a table of RMSE values
model_names <- names(forecasts)
rmse_table <- tibble(Model = model_names, RMSE = rmse_values)
# Sort the table by RMSE in ascending order
rmse_table <- rmse_table[order(rmse_table$RMSE), ]
# Calculate test-set MAPE for each forecast
mape_values <- sapply(forecasts, function(fc) {
  accuracy(fc, test)[2, "MAPE"]
})
# Create a table of MAPE values
model_names <- names(forecasts)
mape_table <- tibble(Model = model_names, MAPE = mape_values)
# Sort the table by MAPE in ascending order
mape_table <- mape_table[order(mape_table$MAPE), ]
# Combine RMSE and MAPE tables
combined_table <- left_join(rmse_table, mape_table, by = "Model")
# Format the combined table using kable
kable(combined_table, align = c("l", "r", "r"), caption = "RMSE and MAPE values for each forecast model")
| Model | RMSE | MAPE |
|---|---|---|
| Holt_Winters | 5.55067 | 4.467333 |
| ETS | 6.17758 | 4.749965 |
| Mean | 12.11305 | 8.981927 |
| Seasonal_Naive | 14.00339 | 11.712190 |
| ARIMA | 15.64036 | 12.011212 |
| Naive | 16.11966 | 13.805155 |
| Simple_Exponential | 16.11970 | 13.805191 |
| Holt | 17.53192 | 15.326538 |
Despite its strong in-sample fit, the seasonal ARIMA model performed poorly on the test set. The best RMSE and MAPE were achieved by the Holt-Winters method, followed closely by ETS. For future analysis, it may be a good option to work with ensemble methods that combine the predictions of multiple models.
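As a starting point for that, an equal-weight ensemble simply averages the point forecasts of two models; because averaging is a convex combination, the ensemble's RMSE can never exceed the mean of the component RMSEs. A minimal sketch on the built-in AirPassengers series (the same idea would apply to the Holt-Winters and ETS models above):

```r
library(forecast)

ap_train <- window(AirPassengers, end = c(1958, 12))
ap_test  <- window(AirPassengers, start = c(1959, 1))
h <- length(ap_test)

# Two component models and their point forecasts
f_hw  <- hw(ap_train, seasonal = "multiplicative", h = h)$mean
f_ets <- forecast(ets(ap_train), h = h)$mean

# Equal-weight ensemble: simple average of the point forecasts
f_ens <- (f_hw + f_ets) / 2

rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
c(hw = rmse(ap_test, f_hw),
  ets = rmse(ap_test, f_ets),
  ensemble = rmse(ap_test, f_ens))
```

Weighted averages (e.g., weights inversely proportional to each model's validation RMSE) are a natural next step beyond the equal-weight version.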